Combine Searches
https://gyazo.com/763fb9ae8f7e017772460b9ef471e515
A talk about combining search methods.
Roughly speaking, DPR is vector search and BM25 is "modern TF-IDF." Vector search is weak on low-frequency words (proper nouns, technical terms, product names), so it is combined with ordinary keyword search.
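One common way to combine two retrievers is Reciprocal Rank Fusion (RRF), sketched below. The talk does not say which fusion method it uses, so RRF and its constant k=60 are assumptions here. Each document receives 1/(k + rank) from every ranked list it appears in, so a document that ranks well in either BM25 or DPR rises in the fused list. The document IDs are made up.

```python
def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of document IDs (best first) with RRF.

    k=60 is a commonly used constant, assumed here for illustration.
    """
    scores = {}
    for ranking in rankings:
        # rank starts at 1 for the top result of each list
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    # highest fused score first
    return sorted(scores, key=lambda d: scores[d], reverse=True)


# hypothetical rankings from the two retrievers
bm25_ranking = ["d3", "d1", "d7"]
dpr_ranking = ["d1", "d5", "d3"]
fused = reciprocal_rank_fusion([bm25_ranking, dpr_ranking])
print(fused)  # ['d1', 'd3', 'd5', 'd7']
```

RRF only uses ranks, not raw scores, which avoids having to calibrate BM25 scores against dense-vector similarities.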
BM25: based on lexical matching
DPR: strong at semantic similarity, but misses low-frequency words and degrades significantly out of distribution (OOD)
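To make "modern TF-IDF" concrete, here is a minimal Okapi BM25 scorer. It assumes whitespace tokenization, the common defaults k1=1.5 and b=0.75, and the Lucene-style IDF variant; real search engines differ in these details. The toy corpus and the product name "xz900" are hypothetical. The rare term gets a larger IDF than the common one, which is why lexical matching catches low-frequency words that vector search can miss.

```python
import math
from collections import Counter


def bm25_scores(query, docs, k1=1.5, b=0.75):
    """Score each tokenized doc against a tokenized query with Okapi BM25.

    k1 and b are common default values, assumed here for illustration.
    """
    n_docs = len(docs)
    avgdl = sum(len(d) for d in docs) / n_docs
    # document frequency: how many docs contain each term
    df = Counter(term for d in docs for term in set(d))
    scores = []
    for d in docs:
        tf = Counter(d)
        score = 0.0
        for term in query:
            if term not in tf:
                continue
            # IDF: rare terms (proper nouns, product names) get large weights
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            # k1 saturates term frequency; b normalizes by document length
            denom = tf[term] + k1 * (1 - b + b * len(d) / avgdl)
            score += idf * tf[term] * (k1 + 1) / denom
        scores.append(score)
    return scores


docs = [
    "the camera has a good lens".split(),
    "the xz900 camera has a good lens".split(),
    "a good phone with a good screen".split(),
]
print(bm25_scores("xz900 camera".split(), docs))
```

The second document ranks first because it matches the rare term "xz900" as well as "camera"; the third matches nothing and scores zero.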
The top 100 results of DPR and BM25 overlap very little
→ each has strengths the other lacks.
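The overlap between two retrievers can be measured as the fraction of their top-k lists they share. Below is a minimal sketch assuming each ranking is a list of document IDs, best first; the function name and the synthetic IDs are hypothetical and do not reproduce any measured result.

```python
def top_k_overlap(ranking_a, ranking_b, k=100):
    """Share of documents that appear in both top-k lists (0.0 to 1.0).

    Assumes each ranking is a list of document IDs, best first.
    """
    top_a = set(ranking_a[:k])
    top_b = set(ranking_b[:k])
    # divide by the smaller list size so short rankings are handled too
    denom = min(k, len(top_a), len(top_b))
    return len(top_a & top_b) / denom if denom else 0.0


# made-up IDs: 5 shared documents, 95 unique to each retriever
shared = [f"shared{i}" for i in range(5)]
bm25_top = shared + [f"bm25_only{i}" for i in range(95)]
dpr_top = [f"dpr_only{i}" for i in range(95)] + shared
print(top_k_overlap(bm25_top, dpr_top))  # 0.05
```

A low value means the two retrievers surface largely different documents, which is the situation where fusing them is most likely to help.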
Surveying information retrieval papers, I found several studies along these lines, and seq-to-seq based approaches were more accurate than building a classifier on top of BERT.
---
This page is auto-translated from /nishio/検索を組み合わせる using DeepL. If something here looks interesting but the auto-translated English is not good enough to understand it, feel free to let me know at @nishio_en. I'm very happy to spread my thoughts to non-Japanese readers.